Formant frequency tracking using Gaussian mixtures with maximum a posteriori adaptation
Abstract
We present a novel method for estimating formant frequencies by fitting Gaussian mixtures to discrete Fourier transform (DFT) magnitude spectra. The method first estimates the Gaussian parameters for a sequence of wideband spectra using the Expectation-Maximization (EM) algorithm. It then refines the parameters using maximum a posteriori (MAP) adaptation. We evaluated the method on manually labeled ground-truth data comprising 516 utterances, comparing its results in various noisy environments with those of PRAAT’s formant tracking algorithm and one other state-of-the-art method. We obtained statistically significant improvements in relative error for the first three formants over all phonetic classes.
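The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact update equations, so the sketch assumes (a) a weighted EM that treats the normalized magnitude spectrum as a histogram over frequency, and (b) a common relevance-factor form of MAP mean adaptation that pulls each frame's means toward prior means. The function names `fit_spectrum_gmm` and `map_adapt_means`, the `relevance` parameter, and the synthetic three-peak spectrum are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_spectrum_gmm(freqs, mags, init_means, n_iter=50, init_std=200.0):
    """Weighted EM for a 1-D Gaussian mixture (illustrative assumption).

    Each frequency bin acts as a data point whose weight is its share
    of the total spectral magnitude.
    """
    freqs = np.asarray(freqs, dtype=float)
    w = np.asarray(mags, dtype=float)
    w = w / w.sum()                                   # bin masses sum to 1
    means = np.asarray(init_means, dtype=float).copy()
    variances = np.full(means.shape, init_std ** 2)
    weights = np.full(means.shape, 1.0 / means.size)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each bin (log domain)
        diff = freqs[:, None] - means[None, :]
        log_r = np.log(weights) - 0.5 * (
            diff ** 2 / variances + np.log(2.0 * np.pi * variances)
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters, weighting bins by spectral mass
        mass = w[:, None] * resp
        nk = mass.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (mass * freqs[:, None]).sum(axis=0) / nk
        diff = freqs[:, None] - means[None, :]
        variances = np.maximum((mass * diff ** 2).sum(axis=0) / nk, 1e-6)
    return weights, means, variances

def map_adapt_means(prior_means, ml_means, soft_counts, relevance=0.5):
    """Relevance-factor MAP adaptation of the means (assumed form).

    Components with little spectral mass stay close to the prior means.
    """
    soft_counts = np.asarray(soft_counts, dtype=float)
    alpha = soft_counts / (soft_counts + relevance)
    return alpha * np.asarray(ml_means) + (1.0 - alpha) * np.asarray(prior_means)

# Synthetic spectrum with three peaks standing in for F1-F3 (made-up values).
freqs = np.arange(0.0, 4000.0, 10.0)
true_peaks = np.array([500.0, 1500.0, 2500.0])
mags = sum(np.exp(-0.5 * ((freqs - p) / 100.0) ** 2) for p in true_peaks)

prior_means = np.array([600.0, 1400.0, 2600.0])       # e.g. previous-frame estimate
weights, means, variances = fit_spectrum_gmm(freqs, mags, prior_means)
adapted = map_adapt_means(prior_means, means, weights)
print("EM means:", means)
print("MAP-adapted means:", adapted)
```

In this sketch the component means play the role of formant frequency estimates, and the MAP step acts as a smoothing prior across frames; how the original work chooses its priors and which parameters it adapts is not stated in the abstract.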
Related articles
Formant Prediction from MFCC Vectors
This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficients (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCC vectors and formant vectors using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method pred...
Predicting Formant Frequencies from MFCC Vectors
This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficients (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCCs and formant frequencies using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predict...
Formant frequency prediction from MFCC vectors in noisy environments
This paper proposes a method of predicting the formant frequencies of a frame of speech from its mel-frequency cepstral coefficient (MFCC) representation. Prediction is achieved through the creation of a Gaussian mixture model (GMM) which models the joint density of formant frequencies and MFCCs. Using this GMM and an input MFCC vector, a maximum a posteriori (MAP) prediction of the formant fre...
Adaptation of children's speech with limited data based on formant-like peak alignment
Automatic recognition of children's speech using acoustic models trained on adult speech results in poor performance due to differences in speech acoustics. These acoustical differences are a consequence of children having shorter vocal tracts and smaller vocal cords than adults. Hence, speaker adaptation needs to be performed. However, in real-world applications, the amount of adaptation data availa...
HMM-based MAP Prediction of Formant Frequencies from N
This paper describes how formant frequencies of voiced and unvoiced speech can be predicted from mel-frequency cepstral coefficients (MFCC) vectors using maximum a posteriori (MAP) estimation within a hidden Markov model (HMM) framework. Gaussian mixture models (GMMs) are used to model the local joint density of MFCCs and formant frequencies. More localised prediction is achieved by modelling s...